Abstract

We present a meta-analysis of independent studies on the potential implication of the single-nucleotide polymorphism (SNP) at the -308 position of the tumor necrosis factor alpha (TNF-α) gene in the occurrence of coronary heart disease (CHD). We use Bayesian analysis to integrate independent data sets and to infer statistically robust measurements of correlation. Bayesian hypothesis testing indicates that there is no preference for the hypothesis that the -308 TNF-α SNP is related to the occurrence of CHD, in the Caucasian or in the Asian population, over the null hypothesis. As a measure of correlation, we use the probability of occurrence of CHD conditional on the presence of the SNP, derived as the posterior probability of the Bayesian meta-analysis. The conditional probability indicates that CHD is not more likely to occur when the SNP is present, which suggests that the -308 TNF-α SNP is not implicated in the occurrence of CHD.


^ Email: cscarvalho@oal.ul.pt 





I. INTRODUCTION 


Coronary heart disease (CHD) is now widely accepted to be a chronic inflammatory disease [1]. CHD is a complex disease with a multifold etiology, with both genetic and environmental factors contributing to its occurrence and development.

Among the genetic factors potentially implicated in the emergence of CHD, the tumor necrosis factor alpha (TNF-α) has attracted great interest for its involvement in the inflammatory response of the immune system [2]. There is evidence that TNF-α is implicated in an increased susceptibility to the pathogenesis of a variety of diseases. In particular, high serum levels of TNF-α affect endothelial cell hemostatic function and hence may modify the risk of developing CHD [3]. There is also the suggestion that the TNF-α gene affects the modulation of lipid metabolism, obesity susceptibility and insulin resistance, thus being potentially implicated in the development of CHD (see Ref. [1] and references therein).

Among the several single-nucleotide polymorphisms (SNPs) that have been identified in the human TNF-α, the best documented one is at the position -308 of the TNF-α gene promoter. This SNP involves the substitution of guanine (G) for adenine (A) and the subsequent creation of two alleles (TNF1(A) and TNF2(G)) and three genotypes (GG, GA and AA) [5]. It has been hypothesised that the TNF-α SNP could change the susceptibility to CHD. However, the results on its association with CHD are contradictory, some implying a different influence of the two alleles on the prevalence of CHD, others implying no association (see Ref. [6] and references therein).

In order to infer the risk of CHD derived from potential risk factors, it is important to develop a formalism that infers correlations among different intervening factors and combines independent data sets for a consistent inference of the correlations. In Ref. |H] we introduced a formalism based on Bayesian inference to infer the correlation of the occurrence of CHD with two risk factors and tested a simplistic model for the signal pathway on the three-variable data set from Ref. jl]. In this manuscript we extend the formalism to extract information from the combination of data from independent studies and to quantify the combined risk of occurrence of CHD from the -308 TNF-α SNP.

The most exhaustive meta-analysis to date on this correlation is the frequentist analysis in Ref. [B] covering Caucasian, Asian, Indian and African populations. This meta-analysis found a 1.5-fold increased risk of developing CHD when the SNP is present in the Caucasian population, but found no association in the other ethnicities. A more recent meta-analysis, covering the same data sets, found no association in the Caucasian or in the Asian population

ra¬ 


In this manuscript we propose a meta-analysis based on Bayesian analysis in an attempt to establish the potential implication of the -308 TNF-α SNP in the occurrence of CHD. This manuscript is organized as follows. In Section II we describe the method. In particular, in Subsection II A we describe the data sets selected; in Subsection II B we propose two hypotheses and test which best and most simply describes the data. In Section III we perform the Bayesian analysis of the selected data sets, combined by ethnicity and CHD phenotype, and present the results. In particular, in Subsection III A we infer the conditional probabilities for the occurrence of CHD given the presence of the SNP; in Subsection III B we test the sensitivity of this formalism to low-significance data sets, to data sets with extreme results and to extreme data sets. Finally, in Section IV we draw the conclusions. A flow chart summarizing the reasoning of this meta-analysis is shown below (Figs. 1, 2 and 3).



Figure 1. Flow Chart. Panel 1 of 3 (hypotheses testing). Ellipses indicate the main actions. Rectangles indicate detailed actions. Rectangles with rounded corners indicate the main results.

Figure 2. Flow Chart. Panel 2 of 3 (inference of conditional probabilities).


II. METHODS 

A. Data Selection 

This analysis is based on twenty data sets (indexed i) on two CHD phenotypes (indexed j) selected from the studies compiled in Ref. [6], following a well-documented study identification, data acquisition and selection strategy, including also statistical tests (Hardy-Weinberg equilibrium, heterogeneity, publication bias). The selected data sets are the studies that report the genotypes of both CHD patients and non-CHD (control) patients for the two CHD phenotypes separately. In particular, the following were included: fifteen data sets from studies on Caucasians, where six studies are on the CHD phenotype coronary stenosis (CS) and nine studies are on the CHD phenotype myocardial infarction (MI) [15-22]; and five data sets from studies on Asians on the CHD phenotype coronary stenosis [23-27]. The rejected data sets are: three studies on Caucasians (for not reporting data on non-CHD patients); four studies on Asians (three for not reporting data on non-CHD patients and one for not separating the CHD phenotypes); the study on Indians and the study on Africans (both for not separating the CHD phenotypes).

Figure 3. Flow Chart. Panel 3 of 3 (sensitivity of results).

The data consist of frequencies of occurrence of the -308 TNF-α SNP in randomly selected CHD patients and non-CHD (control) patients, respectively $n_{\rm SNP,CHD}$ and $n_{\rm SNP,\overline{CHD}}$. The data are summarized in Table I (columns 3-6). The errors indicated were computed from error propagation. Assuming that the methods for measuring the presence of the SNP have a success rate of $r_{\rm suc} = 0.88$ [21], and furthermore that the error of a counting result is given by the Poisson approximation $\sqrt{n}$, then the error of a counting result $n$ on the presence of the SNP is given by $(1 - r_{\rm suc})\sqrt{n}/2$.
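As a small illustration of the error model just described, the following sketch evaluates $(1 - r_{\rm suc})\sqrt{n}/2$ for a given count. The function name `counting_error` and the constant `R_SUC` are hypothetical names introduced here for illustration only; the formula and the value 0.88 are those quoted in the text.

```python
import math

# Success rate of the SNP measurement quoted in the text (r_suc = 0.88).
R_SUC = 0.88


def counting_error(n, r_suc=R_SUC):
    """Error on a count n of SNP occurrences: (1 - r_suc) * sqrt(n) / 2.

    sqrt(n) is the Poisson approximation to the error of a counting result.
    """
    return (1.0 - r_suc) * math.sqrt(n) / 2.0


if __name__ == "__main__":
    # Example with a hypothetical count of 100 occurrences.
    print(counting_error(100))
```

The errors on derived quantities (fractions, Bayes factors, probabilities) would then follow from standard error propagation of these count errors, which is not reproduced here.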


1. Data heterogeneity 

In order to investigate the heterogeneity in the data sets, we compare the size of the effect (defined as a measure of the difference between CHD and non-CHD patients) in each study [28]. As a measure of the size of the effect, we use the fraction of SNP in the population of CHD patients and in the population of non-CHD patients, respectively $f_{\rm SNPinCHD} = n_{\rm SNP,CHD}/n_{\rm CHD}$ and $f_{\rm SNPin\overline{CHD}} = n_{\rm SNP,\overline{CHD}}/n_{\rm \overline{CHD}}$, where $n_{\rm CHD} = n_{\rm SNP,CHD} + n_{\rm \overline{SNP},CHD}$ is the total number of CHD patients and $n_{\rm \overline{CHD}} = n_{\rm SNP,\overline{CHD}} + n_{\rm \overline{SNP},\overline{CHD}}$ is the total number of non-CHD patients. Moreover, the ratio of these two fractions gives an indication of the correlation sign. Hence, if $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} > 1$, the SNP is proportionally more frequent in CHD than in non-CHD patients, hence the study favours a positive correlation between
| Study (i) | Phenotype (j) | CHD patients: GG | CHD patients: GA/AA | Controls: GG | Controls: GA/AA | Bayes factor, data set (i) | Bayes factor, combined (j) |
|---|---|---|---|---|---|---|---|
| Allen et al. (NA) | Cauc CS | 127 | 53 | 222 | 107 | 0.14 ± 0.05 | |
| Elahi et al. (A) | Cauc CS | 59 | 38 | 41 | 54 | 3.54 ± 1.12 | |
| Georges et al. (A) | Cauc CS | 613 | 236 | 222 | 92 | 0.08 ± 0.03 | |
| Sbarsi et al. (A) | Cauc CS | 175 | 73 | 185 | 56 | 0.33 ± 0.11 | |
| Szalai et al. (A) | Cauc CS | 229 | 89 | 181 | 87 | 0.19 ± 0.07 | |
| Vendrell et al. (A) | Cauc CS | 231 | 110 | 159 | 48 | 1.33 ± 0.46 | |
| Combined | Cauc CS | | | | | | 0.049 ± 0.014; 0.041 ± 0.016*; 0.048 ± 0.019** |
| Antonicelli (A) | Cauc MI | 224 | 69 | 246 | 64 | 0.12 ± 0.04 | |
| Bennet et al. (A) | Cauc MI | 799 | 368 | 1037 | 460 | 0.05 ± 0.02 | |
| Dedoussis et al. (A) | Cauc MI | 206 | 31 | 227 | 10 | 26.14 ± 8.56 | |
| Herrmann et al.† (NA) | Cauc MI | 325 | 120 | 376 | 158 | 0.11 ± 0.04 | |
| Herrmann et al.‡ (NA) | Cauc MI | 117 | 79 | 97 | 79 | 0.19 ± 0.06 | |
| Koch et al. (NA) | Cauc MI | 565 | 228 | 244 | 96 | 0.07 ± 0.03 | |
| Padovani et al. (A) | Cauc MI | 120 | 28 | 114 | 34 | 0.17 ± 0.06 | |
| Tobin et al. (A) | Cauc MI | 365 | 182 | 337 | 168 | 0.07 ± 0.03 | |
| Tulyakova et al. (NA) | Cauc MI | 242 | 64 | 177 | 69 | 0.60 ± 0.21 | |
| Combined | Cauc MI | | | | | | 0.026 ± 0.011; 0.035 ± 0.015*; 0.030 ± 0.012** |
| Chen et al. (NA) | Asian CS | 29 | 11 | 21 | 9 | 0.27 ± 0.08 | |
| Hou et al. (NA) | Asian CS | 268 | 32 | 802 | 103 | 0.05 ± 0.02 | |
| Li et al. (NA) | Asian CS | 66 | 8 | 138 | 20 | 0.12 ± 0.04 | |
| Liu et al. (A) | Asian CS | 234 | 52 | 142 | 34 | 0.10 ± 0.03 | |
| Shun et al. (A) | Asian CS | 54 | 19 | 118 | 20 | 1.10 ± 0.34 | |
| Combined | Asian CS | | | | | | 0.151 ± 0.057; 0.114 ± 0.043*; 0.103 ± 0.037** |

Table I. Data sets and results of hypothesis testing. Column 1: Studies selected for the meta-analysis. The index (A) indicates that a possible association was measured in the original publication; the index (NA) indicates that no association was measured in the original publication. Column 2: The phenotype of the patients in the studies grouped by ethnicity. Columns 3-6: Genotypic frequencies of TNF-α -308 in CHD patients and non-CHD (control) patients from twenty studies (indexed i) and for two CHD phenotypes (indexed j), namely coronary stenosis (CS) and myocardial infarction (MI). Columns 7-8: The Bayes factors for the hypotheses considered, for each data set, $(H_1/H_0)_i$, and for the meta-data set of each CHD phenotype, $(H_1/H_0)_j$. † French cohort. ‡ Irish cohort. * Excluding Elahi et al., Dedoussis et al. and Chen et al., respectively for each phenotype. ** Excluding Georges et al., Bennet et al. and Hou et al., respectively for each phenotype.


the presence of the SNP and the occurrence of CHD; if $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} < 1$, the SNP is proportionally less frequent in CHD than in non-CHD patients, hence the study favours a negative correlation; if $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} = 1$, the SNP is equally frequent in CHD and in non-CHD patients, hence the study favours no correlation.
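The ratio of fractions defined above can be sketched as follows. The sketch assumes that the GA/AA column of Table I counts patients carrying the SNP and the GG column counts patients without it; the function name `snp_fraction_ratio` and its argument names are hypothetical.

```python
def snp_fraction_ratio(n_snp_chd, n_nosnp_chd, n_snp_ctrl, n_nosnp_ctrl):
    """Ratio f_SNPinCHD / f_SNPin(non-CHD) for one data set.

    n_snp_chd, n_nosnp_chd   : CHD patients with / without the SNP
    n_snp_ctrl, n_nosnp_ctrl : non-CHD patients with / without the SNP
    """
    f_chd = n_snp_chd / (n_snp_chd + n_nosnp_chd)
    f_ctrl = n_snp_ctrl / (n_snp_ctrl + n_nosnp_ctrl)
    return f_chd / f_ctrl


if __name__ == "__main__":
    # Counts as read from Table I (assumption: GA/AA = SNP present).
    elahi = snp_fraction_ratio(38, 59, 54, 41)
    dedoussis = snp_fraction_ratio(31, 206, 10, 227)
    for name, ratio in (("Elahi et al.", elahi), ("Dedoussis et al.", dedoussis)):
        sign = "positive" if ratio > 1 else "negative" if ratio < 1 else "none"
        print(f"{name}: ratio = {ratio:.2f}, favoured correlation: {sign}")
```

With these counts the ratio is below one for Elahi et al. and above one for Dedoussis et al., i.e. the two studies favour opposite correlation signs.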

We plot this ratio of fractions for each study, grouped by ethnicity and CHD phenotype, in Fig. 4. We also plot the ratio for the combined data sets included in each panel. We observe that the ratios of the data sets are asymmetrically distributed about the ratio equal to one, showing a predominance of ratios smaller than one. The ratio of the combined data sets included in each panel is slightly smaller than one for the Caucasian studies (for both CHD phenotypes) and larger than one for the Asian studies. This asymmetry indicates heterogeneity in the studies, as also observed in the meta-analysis of Ref. [U].

Figure 4. Funnel plot for the ratio of SNP fractions. The ratio of the fraction of SNP in the population of CHD patients to the fraction of SNP in the population of non-CHD patients, $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$, for each study, grouped by ethnicity and CHD phenotype. Top panel: Caucasians with coronary stenosis; Middle panel: Caucasians with myocardial infarction; Bottom panel: Asians with coronary stenosis. The solid horizontal line is the ratio of the combined data sets included in each panel. The dashed horizontal line marks the ratio equal to one.

In Fig. 5 (top panel), we plot this ratio of fractions as a function of the sample size. We observe that smaller data sets are distributed across a wide range of values of this ratio, whereas larger data sets are distributed more closely to one. This implies that smaller data sets favour either positive or negative correlation, whereas larger data sets favour no correlation.

Figure 5. Scatter plots as a function of the sample size. Top panel: The ratio of the frequency of SNP in the CHD population to the frequency of SNP in the non-CHD population as a function of the sample size. Bottom panel: The Bayes factor for the two hypotheses discussed in the text as a function of the sample size.


B. Hypotheses Testing 


First we test the hypothesis $H_1$ that the presence of the TNF-α SNP is related to the occurrence of CHD against the null hypothesis $H_0$ that the presence of the SNP is unrelated to the occurrence of CHD. By the Bayes theorem, the probability of a hypothesis given the data $D_{\rm SNP}$ is the posterior probability of the corresponding hypothesis

$$P(H_i|D_{\rm SNP}) = \frac{P(D_{\rm SNP}|H_i)\,P(H_i)}{P(D_{\rm SNP})}, \qquad (1)$$

where $P(D_{\rm SNP}|H_i)$ is the evidence, $P(H_i)$ is the prior probability of $H_i$ and $P(D_{\rm SNP}) = \sum_i P(D_{\rm SNP}|H_i)P(H_i)$. The subscript in $D_{\rm SNP}$ reminds us that the random variable is the occurrence of the SNP. In order to infer which hypothesis is more likely in view of the data, we compare the evidence computed for the two hypotheses. The evidence is the integral of the likelihood over the $j$-dimensional parameter space of the hypothesis $H_i$

$$P(D_{\rm SNP}|H_i) = \int dp_{i,j}\, P(D_{\rm SNP}|p_{i,j},H_i)\,P(p_{i,j}|H_i). \qquad (2)$$

Assuming equal prior probabilities for the two hypotheses, then from Eqn. (1) it follows that

$$\frac{P(H_1|D_{\rm SNP})}{P(H_0|D_{\rm SNP})} = \frac{P(D_{\rm SNP}|H_1)}{P(D_{\rm SNP}|H_0)}. \qquad (3)$$

We compute the evidence of the two hypotheses, for each data set separately and for the combined data sets grouped by CHD phenotype. We follow the procedure detailed in Ref [H], which we here summarize for one data set and then generalize for the combined data sets. In all cases, we choose a uniform distribution for the prior of the parameters, which is justified by the absence of an a priori bias on the values of the parameters [29].

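The evidence of $H_0$, $P(D_{\rm SNP}|H_0)$, is computed assuming that the presence of the SNP is described by a binomial distribution with one parameter only, namely the probability $p_0$ that the SNP occurs in a given population. For $n_{\rm SNP}$ occurrences of the SNP and $n_{\rm \overline{SNP}}$ non-occurrences of the SNP in a sample of size $n = n_{\rm SNP} + n_{\rm \overline{SNP}}$, the likelihood $P(D_{\rm SNP}|p_0,H_0)$ is given by

$$P(D_{\rm SNP}|p_0,H_0) = p_0^{\,n_{\rm SNP}}\,(1-p_0)^{\,n_{\rm \overline{SNP}}}. \qquad (4)$$

Moreover, assuming a uniform prior distribution for $p_0$, $P(p_0|H_0) = 1$, we find that

$$P(D_{\rm SNP}|H_0) = \int_0^1 dp_0\, P(D_{\rm SNP}|p_0,H_0)\,P(p_0|H_0) = \frac{n_{\rm SNP}!\; n_{\rm \overline{SNP}}!}{(n_{\rm SNP} + n_{\rm \overline{SNP}} + 1)!}, \qquad (5)$$

where $n!$ stands for the factorial of $n$.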

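The evidence of $H_1$, $P(D_{\rm SNP}|H_1)$, is computed assuming that the presence of the SNP is described by a binomial distribution with two parameters, namely the probability $p_{1,\rm CHD}$ that the SNP occurs in the subset of CHD patients and the probability $p_{1,\rm \overline{CHD}}$ that the SNP occurs in the subset of non-CHD patients,

$$P(D_{\rm SNP}|H_1) = \int_0^1 dp_{1,\rm CHD} \int_0^1 dp_{1,\rm \overline{CHD}}\; P(D_{\rm SNP}|p_{1,\rm CHD},p_{1,\rm \overline{CHD}},H_1)\,P(p_{1,\rm CHD},p_{1,\rm \overline{CHD}}|H_1). \qquad (6)$$

For $n_{\rm SNP,CHD}$ occurrences of the SNP and $n_{\rm \overline{SNP},CHD}$ non-occurrences of the SNP in a subset of CHD patients of size $n_{\rm CHD} = n_{\rm SNP,CHD} + n_{\rm \overline{SNP},CHD}$, and also for $n_{\rm SNP,\overline{CHD}}$ occurrences of the SNP and $n_{\rm \overline{SNP},\overline{CHD}}$ non-occurrences of the SNP in a subset of non-CHD patients of size $n_{\rm \overline{CHD}} = n_{\rm SNP,\overline{CHD}} + n_{\rm \overline{SNP},\overline{CHD}}$, the likelihood $P(D_{\rm SNP}|p_{1,\rm CHD},p_{1,\rm \overline{CHD}},H_1)$ is separable, i.e. it can be decomposed into the product of the likelihoods $P(D_{\rm SNP}|p_{1,\rm CHD},H_1)$ and $P(D_{\rm SNP}|p_{1,\rm \overline{CHD}},H_1)$, as follows

$$P(D_{\rm SNP}|p_{1,\rm CHD},p_{1,\rm \overline{CHD}},H_1) = p_{1,\rm CHD}^{\,n_{\rm SNP,CHD}}\,(1-p_{1,\rm CHD})^{\,n_{\rm \overline{SNP},CHD}}\; p_{1,\rm \overline{CHD}}^{\,n_{\rm SNP,\overline{CHD}}}\,(1-p_{1,\rm \overline{CHD}})^{\,n_{\rm \overline{SNP},\overline{CHD}}}$$
$$= P(D_{\rm SNP}|p_{1,\rm CHD},H_1)\,P(D_{\rm SNP}|p_{1,\rm \overline{CHD}},H_1). \qquad (7)$$

Assuming a uniform probability for $p_{1,\rm CHD}$ and $p_{1,\rm \overline{CHD}}$, $P(p_{1,\rm CHD},p_{1,\rm \overline{CHD}}|H_1) = 1$, and moreover that the priors on $p_{1,\rm CHD}$ and $p_{1,\rm \overline{CHD}}$ are separable, the evidence will also be separable and given by

$$P(D_{\rm SNP}|H_1) = \int_0^1 dp_{1,\rm CHD}\, P(D_{\rm SNP}|p_{1,\rm CHD},H_1)\,P(p_{1,\rm CHD}|H_1) \times \int_0^1 dp_{1,\rm \overline{CHD}}\, P(D_{\rm SNP}|p_{1,\rm \overline{CHD}},H_1)\,P(p_{1,\rm \overline{CHD}}|H_1)$$
$$= \frac{n_{\rm SNP,CHD}!\; n_{\rm \overline{SNP},CHD}!}{(n_{\rm SNP,CHD} + n_{\rm \overline{SNP},CHD} + 1)!}\; \frac{n_{\rm SNP,\overline{CHD}}!\; n_{\rm \overline{SNP},\overline{CHD}}!}{(n_{\rm SNP,\overline{CHD}} + n_{\rm \overline{SNP},\overline{CHD}} + 1)!}. \qquad (8)$$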


In order to compare the hypotheses, we take the ratio of the corresponding evidences, $B_{10} = P(H_1|D)/P(H_0|D)$, which we present in Table I (columns 7-8). This quantity is known as the Bayes factor and gives empirical levels of significance for the strength of the evidence of the test hypothesis over that of the null hypothesis. It also encapsulates the Occam's factor, which measures the adequacy of a hypothesis to the data over the parameter space of the hypothesis [22]. The levels of significance ascribed to the Bayes factor are calibrated by the Jeffreys' scale [30]. According to this scale, a Bayes factor larger than one indicates that $H_1$ is favoured over $H_0$. Otherwise, $H_0$ is favoured over $H_1$. For the data sets taken separately, the results from this hypothesis test mostly agree with the corresponding results presented in the meta-analysis by Chu et al. (see Fig. 1 of Ref. ID)-
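The Bayes factor can be evaluated numerically from the factorial expressions for the two evidences. The sketch below is one possible implementation, not the authors' code: it works with log-factorials (via `math.lgamma`) to avoid overflow for large counts, it assumes that under $H_0$ the CHD and non-CHD counts are pooled into a single binomial sample, and it assumes that the GA/AA column of Table I counts SNP carriers. The function names are hypothetical.

```python
from math import exp, lgamma


def log_evidence(n_snp, n_nosnp):
    """log of n_snp! * n_nosnp! / (n_snp + n_nosnp + 1)!  (uniform prior)."""
    return lgamma(n_snp + 1) + lgamma(n_nosnp + 1) - lgamma(n_snp + n_nosnp + 2)


def bayes_factor_10(snp_chd, nosnp_chd, snp_ctrl, nosnp_ctrl):
    """Bayes factor B10 = P(D|H1) / P(D|H0) for one data set.

    H1: separate SNP probabilities for CHD and non-CHD patients.
    H0: a single SNP probability for the pooled sample (assumption).
    """
    log_h1 = log_evidence(snp_chd, nosnp_chd) + log_evidence(snp_ctrl, nosnp_ctrl)
    log_h0 = log_evidence(snp_chd + snp_ctrl, nosnp_chd + nosnp_ctrl)
    return exp(log_h1 - log_h0)


if __name__ == "__main__":
    # Counts as read from Table I (SNP present = GA/AA, absent = GG).
    print("Allen et al.:", bayes_factor_10(53, 127, 107, 222))
    print("Dedoussis et al.:", bayes_factor_10(31, 206, 10, 227))
```

Under these assumptions the two printed values should come out near the corresponding entries of Table I (about 0.14 and about 26), which suggests that this reading of the evidences is the one used for the tabulated Bayes factors.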

We plot the Bayes factor for each study, grouped by ethnicity and CHD phenotype, in Fig. 6. For the data sets taken separately, we observe that the Bayes factor is asymmetrically distributed about the Bayes factor equal to one, with most Bayes factors being smaller than one. The exceptions are Elahi et al. [in], Vendrell et al. [9] and Dedoussis et al. [17] for the Caucasian population, and Shun et al. [27] for the Asian population. This asymmetry indicates heterogeneity in the results. For the combined data sets included in each panel, the Bayes factor takes values 0.03 to 0.05 for the Caucasian population and 0.15 for the Asian population, which indicates that there is no evidence for $H_1$ over $H_0$. We also observe that, for the Caucasian population, the Bayes factor of the combined data sets is outside the range of variability of the Bayes factor of the data sets considered separately. This suggests that the combination of the Caucasian data sets causes a new data pattern to emerge. Conversely, the combination of the Asian data sets leads to an approximately average data pattern. Hence we conclude that the data favour $H_0$ over $H_1$. Since $H_0$ yields trivial results, in the subsequent subsections we present the results also for $H_1$ to illustrate the application of the formalism to a more general setup. It is also instructive to compare the subsequent results using both hypotheses.

In Fig. 5 (bottom panel), we plot the Bayes factor as a function of the sample size. We observe that smaller data sets are distributed across a wide range of values of the Bayes factor, whereas larger data sets are distributed across values smaller than one. This implies that smaller data sets favour either $H_0$ or $H_1$, whereas larger data sets favour $H_0$.

Figure 6. Funnel plot for the Bayes factor. The Bayes factor for each study, grouped by ethnicity and CHD phenotype. Top panel: Caucasians with coronary stenosis; Middle panel: Caucasians with myocardial infarction; Bottom panel: Asians with coronary stenosis. The solid horizontal line marks the average Bayes factor of the data sets included in each panel. The dashed horizontal line marks the Bayes factor equal to one.


1. Correlation sign 

Comparing Fig. 6 with Fig. 4, we observe that, among the studies with Bayes factor larger than one, Elahi et al. has a ratio $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} < 1$, i.e. the SNP is proportionally less frequent in CHD than in non-CHD patients, which indicates a negative correlation between the presence of the SNP and the occurrence of CHD. Another example of a comparatively large Bayes factor and low ratio $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$ is the study of Tulyakova et al. This indicates that the hypotheses as formulated do not distinguish the correlation sign.

To further explore how the ratio $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$ affects the result of the hypothesis testing, we consider several realizations of CHD populations with the same $n_{\rm CHD}$ but with different fractions of SNP. More specifically, for each combined data set, we vary $n_{\rm SNP,CHD}$ while simultaneously varying $n_{\rm \overline{SNP},CHD}$ so as to keep $n_{\rm CHD}$ constant. Throughout the different realizations, the size of the control population is kept equal to the size of the control population of the combined data sets grouped by ethnicity and CHD phenotype. For each realization, we compute both $f_{\rm SNPinCHD}$ (note that $f_{\rm SNPin\overline{CHD}}$ is by construction kept fixed) and $B_{10}$, and plot the results in Fig. 7. The realizations with the $f_{\rm SNPinCHD}$ of a real combined data set are marked as red points. In the top panel, we plot $B_{10}$ as a function of $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$, from which there result three parabolae centred at the same point. In the bottom panel, for a better visualization of the behaviour of $B_{10}$, we plot $B_{10}$ as a function of $f_{\rm SNPinCHD}$, from which there result three parabolae centred at different points. We observe that $B_{10}$ follows a parabola, taking the minimum value when $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} = 1$ and increasing in both directions with the increase of $|f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} - 1|$, i.e. with the increase of the distance from 1. This confirms that the hypotheses as formulated do not distinguish between a positive correlation of the SNP with CHD ($f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} > 1$) and a negative correlation ($f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}} < 1$). Hence, the value of $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$ complements the value of $B_{10}$ in the characterization of the correlation.

Figure 7. Bayes factor as a function of the frequency of SNP in the CHD populations. The Bayes factor for several realizations of CHD populations with the same $n_{\rm CHD}$ but with different fractions of SNP, grouped by ethnicity and CHD phenotype. The realizations that correspond to a real combined data set are marked as red points. The dashed horizontal line marks the Bayes factor equal to one. Top panel: The Bayes factor as a function of $f_{\rm SNPinCHD}/f_{\rm SNPin\overline{CHD}}$. Bottom panel: The Bayes factor as a function of $f_{\rm SNPinCHD}$.
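The scan over realizations described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the evidences are the factorial expressions with uniform priors, $H_0$ pools the CHD and control counts, and the control counts are the pooled Caucasian CS controls as summed from Table I (444 with the SNP, 1010 without). The sketch returns $\log_{10} B_{10}$ rather than $B_{10}$, since the Bayes factor becomes extremely large at the ends of the scan. All function names are hypothetical.

```python
from math import lgamma, log


def log_evidence(n_snp, n_nosnp):
    """log of n_snp! * n_nosnp! / (n_snp + n_nosnp + 1)!  (uniform prior)."""
    return lgamma(n_snp + 1) + lgamma(n_nosnp + 1) - lgamma(n_snp + n_nosnp + 2)


def log10_bayes_factor(snp_chd, nosnp_chd, snp_ctrl, nosnp_ctrl):
    """log10 of B10, with H0 pooling CHD and control counts (assumption)."""
    log_h1 = log_evidence(snp_chd, nosnp_chd) + log_evidence(snp_ctrl, nosnp_ctrl)
    log_h0 = log_evidence(snp_chd + snp_ctrl, nosnp_chd + nosnp_ctrl)
    return (log_h1 - log_h0) / log(10.0)


def sweep_log10_bf(n_chd, snp_ctrl, nosnp_ctrl):
    """Vary n_SNP,CHD from 0 to n_chd at fixed n_chd and fixed controls.

    Returns a list of (ratio of SNP fractions, log10 B10), indexed by n_SNP,CHD.
    """
    f_ctrl = snp_ctrl / (snp_ctrl + nosnp_ctrl)
    curve = []
    for snp_chd in range(n_chd + 1):
        ratio = (snp_chd / n_chd) / f_ctrl
        curve.append((ratio, log10_bayes_factor(snp_chd, n_chd - snp_chd,
                                                snp_ctrl, nosnp_ctrl)))
    return curve


if __name__ == "__main__":
    curve = sweep_log10_bf(2033, 444, 1010)
    best_ratio, best_log_bf = min(curve, key=lambda point: point[1])
    print(f"minimum of log10(B10) = {best_log_bf:.2f} at ratio = {best_ratio:.3f}")
```

In this sketch the minimum falls very close to a ratio of one and the curve rises on both sides, which is the sign-blind behaviour described in the text.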
III. RESULTS


A. Inference of conditional probabilities


1. Posterior probability for the occurrence of CHD


We proceed to compute the probability for the occurrence of CHD, i.e. given the data on the presence of the SNP, we determine the probability that a patient has CHD. This is defined as the posterior probability

$$P({\rm CHD}|D_{\rm SNP},H_i) = \frac{P(D_{\rm SNP}|{\rm CHD},H_i)\,P({\rm CHD})}{P(D_{\rm SNP}|H_i)}. \qquad (9)$$

The prior probability $P({\rm CHD})$ is based on the available information on the occurrence of CHD. This probability can be computed by combining all the risk factors per age interval per pathology. According to the European guidelines, less than 4 in 1000 people have CS [52], whereas about 1 in 1000 people have MI [33]. We then use $P({\rm CHD}) = 0.004$ for CS and $P({\rm CHD}) = 0.001$ for MI.

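The evidence $P(D_{\rm SNP}|H_i)$ can be decomposed as

$$P(D_{\rm SNP}|H_i) = P(D_{\rm SNP}|{\rm CHD},H_i)\,P({\rm CHD}) + P(D_{\rm SNP}|\overline{\rm CHD},H_i)\,P(\overline{\rm CHD}). \qquad (10)$$

In the case of $H_0$,

$$P(D_{\rm SNP}|{\rm CHD},H_0) = \binom{n}{n_{\rm SNP}}\, p_0^{\,n_{\rm SNP}}\,(1-p_0)^{\,n_{\rm \overline{SNP}}},$$
$$P(D_{\rm SNP}|\overline{\rm CHD},H_0) = P(D_{\rm SNP}|{\rm CHD},H_0), \qquad (11)$$

whereas in the case of $H_1$,

$$P(D_{\rm SNP}|{\rm CHD},H_1) = \binom{n_{\rm CHD}}{n_{\rm SNP,CHD}}\, p_{1,\rm CHD}^{\,n_{\rm SNP,CHD}}\,(1-p_{1,\rm CHD})^{\,n_{\rm \overline{SNP},CHD}},$$
$$P(D_{\rm SNP}|\overline{\rm CHD},H_1) = \binom{n_{\rm \overline{CHD}}}{n_{\rm SNP,\overline{CHD}}}\, p_{1,\rm \overline{CHD}}^{\,n_{\rm SNP,\overline{CHD}}}\,(1-p_{1,\rm \overline{CHD}})^{\,n_{\rm \overline{SNP},\overline{CHD}}}. \qquad (12)$$

In the previous section, we computed the evidence by marginalizing the parameters of each hypothesis. Here, assuming a hypothesis $H_i$ and using the Bayes theorem, we compute the posterior probability of each parameter $p_{i,j}$ given the data

$$P(p_{i,j}|D_{\rm SNP}) = \frac{P(D_{\rm SNP}|p_{i,j})\,P(p_{i,j}|H_i)}{P(D_{\rm SNP}|H_i)} \qquad (13)$$

and find for $p_{i,j}$ the value that maximizes the likelihood $P(D_{\rm SNP}|p_{i,j})$. In the case of $H_0$, we compute the posterior probability of the single parameter $p_{i,j} = p_0$, where $P(D_{\rm SNP}|p_0)$ is given by Eqn. (4), $P(D_{\rm SNP}|H_0)$ is given by Eqn. (5) and $P(p_0|H_0)$ is assumed uniform. Taking the derivative with respect to $p_0$ and solving for $dP(p_0|D_{\rm SNP})/dp_0 = 0$, we find for the maximum-likelihood value of $p_0$ the value

$$p_{0({\rm maxL})} = \frac{n_{\rm SNP}}{n_{\rm SNP} + n_{\rm \overline{SNP}}}. \qquad (14)$$

Similarly, in the case of $H_1$, we compute the posterior probability of each of the two parameters $p_{i,j} = \{p_{1,\rm CHD}, p_{1,\rm \overline{CHD}}\}$, where $P(D_{\rm SNP}|p_{1,\rm CHD})$ and $P(D_{\rm SNP}|p_{1,\rm \overline{CHD}})$ are given by Eqn. (7), $P(D_{\rm SNP}|H_1)$ is given by Eqn. (8) and both $P(p_{1,\rm CHD}|H_1)$ and $P(p_{1,\rm \overline{CHD}}|H_1)$ are assumed uniform, finding for the maximum-likelihood values of $p_{1,\rm CHD}$ and $p_{1,\rm \overline{CHD}}$ respectively

$$p_{1,{\rm CHD}({\rm maxL})} = \frac{n_{\rm SNP,CHD}}{n_{\rm SNP,CHD} + n_{\rm \overline{SNP},CHD}}, \qquad (15)$$

$$p_{1,\overline{\rm CHD}({\rm maxL})} = \frac{n_{\rm SNP,\overline{CHD}}}{n_{\rm SNP,\overline{CHD}} + n_{\rm \overline{SNP},\overline{CHD}}}. \qquad (16)$$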

Analogously we define the posterior probability

$$P(\overline{\rm CHD}|D_{\rm SNP},H_i) = \frac{P(D_{\rm SNP}|\overline{\rm CHD},H_i)\,P(\overline{\rm CHD})}{P(D_{\rm SNP}|H_i)}. \qquad (17)$$

Finally, using the maximum-likelihood value of $p_{i,j}$, we compute $P({\rm CHD}|D_{\rm SNP},H_i)$ for the data sets combined, which we present in Table II.

In the case of $H_0$, no information is added to the posterior probability, since by Eqn. (11) the posterior probabilities equal the prior. Conversely, in the case of $H_1$, information is added to the posterior probability, since by Eqn. (12) there result posterior probabilities different from the prior albeit compatible with the prior.
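A minimal sketch of the posterior probability of CHD under $H_1$ is given below. It is an illustrative reading of the equations above, not the authors' code: it assumes that the two likelihoods are binomial probabilities (including the binomial coefficient) evaluated at the maximum-likelihood fractions, that the evidence is the prior-weighted sum of the two, and that the pooled Caucasian CS counts are those summed from Table I (599 and 1434 for CHD patients, 444 and 1010 for controls, with GA/AA taken as SNP present). Function names are hypothetical, and all counts must be strictly between zero and the sample size.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """log of C(n, k) * p**k * (1 - p)**(n - k), for 0 < p < 1."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_coeff + k * log(p) + (n - k) * log(1.0 - p)


def posterior_chd_h1(snp_chd, nosnp_chd, snp_ctrl, nosnp_ctrl, prior_chd):
    """P(CHD | D_SNP, H1) with likelihoods at the maximum-likelihood values."""
    n_chd = snp_chd + nosnp_chd
    n_ctrl = snp_ctrl + nosnp_ctrl
    p_chd = snp_chd / n_chd      # maximum-likelihood SNP fraction, CHD patients
    p_ctrl = snp_ctrl / n_ctrl   # maximum-likelihood SNP fraction, controls
    like_chd = exp(log_binom_pmf(snp_chd, n_chd, p_chd))
    like_ctrl = exp(log_binom_pmf(snp_ctrl, n_ctrl, p_ctrl))
    numerator = like_chd * prior_chd
    return numerator / (numerator + like_ctrl * (1.0 - prior_chd))


if __name__ == "__main__":
    # Pooled Caucasian CS counts (summed from Table I), prior P(CHD) = 0.004.
    print(posterior_chd_h1(599, 1434, 444, 1010, prior_chd=0.004))
```

With these inputs the result should be roughly $3.4 \times 10^{-3}$, close to the corresponding $H_1$ entry of Table II. When the two likelihoods coincide, as under $H_0$, the same expression returns the prior.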


2. Prediction of the presence of the SNP 

We now proceed to compute the probability for the presence of the SNP, i.e. given the data, we determine the probability that a randomly selected patient (with or without CHD) has the SNP. This probability is defined as

$$P({\rm nextSNP}|D_{\rm SNP},H_i) = P({\rm nextSNP}|D_{\rm SNP},{\rm CHD})\,P({\rm CHD}|D_{\rm SNP},H_i) + P({\rm nextSNP}|D_{\rm SNP},\overline{\rm CHD})\,P(\overline{\rm CHD}|D_{\rm SNP},H_i)$$
$$= P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_i) + P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_i). \qquad (18)$$

In the case of $H_0$,

$$P({\rm nextSNP}|D_{\rm SNP},{\rm CHD}) = P({\rm nextSNP}|D_{\rm SNP},\overline{\rm CHD}) = p_0, \qquad (19)$$

whereas in the case of $H_1$,

$$P({\rm nextSNP}|D_{\rm SNP},{\rm CHD}) = p_{1,\rm CHD},$$
| Hypothesis | Probabilities | Cauc CS | Cauc MI | Asian CS |
|---|---|---|---|---|
| $H_0$ | $p_0$ | 0.299 ± 0.001 | 0.284 ± 0.001 | 0.141 ± 0.001 |
| $H_0$ | $P({\rm CHD}|D_{\rm SNP},H_0)$ | (4.00 ± 1.31) · 10^-3 | (1.00 ± 0.25) · 10^-3 | (4.00 ± 0.91) · 10^-3 |
| $H_0$ | $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_0)$ | (1.19 ± 0.39) · 10^-3 | (0.28 ± 0.07) · 10^-3 | (0.56 ± 0.13) · 10^-3 |
| $H_0$ | $P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_0)$ | 0.298 ± 1.093 | 0.284 ± 1.752 | 0.141 ± 0.360 |
| $H_0$ | $P({\rm nextSNP}|D_{\rm SNP},H_0)$ | 0.299 ± 1.093 | 0.284 ± 1.572 | 0.141 ± 0.360 |
| $H_0$ | $r_{\rm nextSNP,CHD}$ | (4.00 ± 14.65) · 10^-3 | (1.00 ± 5.54) · 10^-3 | (4.00 ± 10.22) · 10^-3 |
| $H_1$ | $p_{1,\rm CHD}$ | 0.295 ± 0.001 | 0.283 ± 0.001 | 0.158 ± 0.001 |
| $H_1$ | $p_{1,\rm \overline{CHD}}$ | 0.305 ± 0.001 | 0.285 ± 0.001 | 0.132 ± 0.001 |
| $H_1$ | $P({\rm CHD}|D_{\rm SNP},H_1)$ | (3.42 ± 7.94) · 10^-3 | (0.98 ± 3.26) · 10^-3 | (5.00 ± 7.02) · 10^-3 |
| $H_1$ | $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_1)$ | (1.00 ± 2.34) · 10^-3 | (0.28 ± 0.92) · 10^-3 | (0.79 ± 1.11) · 10^-3 |
| $H_1$ | $P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_1)$ | 0.304 ± 0.598 | 0.285 ± 0.926 | 0.131 ± 0.244 |
| $H_1$ | $P({\rm nextSNP}|D_{\rm SNP},H_1)$ | 0.305 ± 0.598 | 0.285 ± 0.926 | 0.132 ± 0.244 |
| $H_1$ | $r_{\rm nextSNP,CHD}$ | (3.30 ± 10.02) · 10^-3 | (0.98 ± 4.54) · 10^-3 | (6.00 ± 13.84) · 10^-3 |

Table II. Probabilities inferred from the combined data sets. To each hypothesis there correspond several rows consisting of: a) the parameters $p_{i,j}$ given by the maximum-likelihood values, in particular, $p_0$ (hence one row) in the case of $H_0$, $p_{1,\rm CHD}$ and $p_{1,\rm \overline{CHD}}$ (hence two rows) in the case of $H_1$; b) the posterior probability for the occurrence of CHD, $P({\rm CHD}|D_{\rm SNP},H_i)$ (hence one row for each hypothesis); c) the predicted probabilities for the presence of the SNP, namely $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_i)$, $P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_i)$ and $P({\rm nextSNP}|D_{\rm SNP},H_i)$ (hence three rows for each hypothesis); and d) the probability ratio that measures the influence of CHD in the presence of the SNP, $r_{\rm nextSNP,CHD} = P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_i)/P({\rm nextSNP}|D_{\rm SNP},H_i)$, computed from the combined data of each phenotype (hence one row for each hypothesis). Column 1: The hypotheses. Column 2: The inferred quantities, as described above. Columns 3-5: The values of the inferred quantities for the combined ethnicity and CHD phenotype.


$$P({\rm nextSNP}|D_{\rm SNP},\overline{\rm CHD}) = p_{1,\rm \overline{CHD}}. \qquad (20)$$

Using the maximum-likelihood values of $p_{i,j}$ and the posterior probability $P({\rm CHD}|D_{\rm SNP},H_i)$ computed above, we compute $P({\rm nextSNP}|D_{\rm SNP},H_i)$, which we present in Table II.

For completeness, using the Bayes theorem, we invert $P({\rm nextSNP}|D_{\rm SNP},{\rm CHD})$ to find the probability that CHD will occur given that the SNP is present in a randomly selected patient

$$P({\rm CHD}|{\rm nextSNP},H_i) = \frac{P({\rm nextSNP}|D_{\rm SNP},{\rm CHD})\,P({\rm CHD}|D_{\rm SNP},H_i)}{P({\rm nextSNP}|D_{\rm SNP},H_i)}. \qquad (21)$$

Similarly, inverting $P({\rm nextSNP}|D_{\rm SNP},\overline{\rm CHD})$, we find the probability that CHD will not occur given that the SNP is present in a randomly selected patient, $P(\overline{\rm CHD}|{\rm nextSNP},H_i)$, which can be found simply by replacing CHD by $\overline{\rm CHD}$ in Eqn. (21).
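The predictive probabilities and their inversion can be sketched as follows. The inputs are the maximum-likelihood fractions and the posterior probability of CHD; the example values are the rounded $H_1$ entries for the Caucasian CS phenotype as listed in Table II. The function name `predictive_snp` and its argument names are hypothetical.

```python
def predictive_snp(p_chd, p_ctrl, post_chd):
    """Predictive probabilities for the presence of the SNP.

    p_chd    : P(nextSNP | D_SNP, CHD)
    p_ctrl   : P(nextSNP | D_SNP, non-CHD)
    post_chd : P(CHD | D_SNP, H_i)

    Returns (joint_chd, joint_ctrl, total, chd_given_snp), where
    joint_chd     = P(nextSNP, CHD | D_SNP, H_i),
    joint_ctrl    = P(nextSNP, non-CHD | D_SNP, H_i),
    total         = P(nextSNP | D_SNP, H_i),
    chd_given_snp = P(CHD | nextSNP, H_i) = joint_chd / total.
    """
    joint_chd = p_chd * post_chd
    joint_ctrl = p_ctrl * (1.0 - post_chd)
    total = joint_chd + joint_ctrl
    return joint_chd, joint_ctrl, total, joint_chd / total


if __name__ == "__main__":
    # Rounded H1 values for Caucasian CS, as listed in Table II.
    joint_chd, joint_ctrl, total, chd_given_snp = predictive_snp(0.295, 0.305, 3.42e-3)
    print(f"P(nextSNP, CHD)     = {joint_chd:.2e}")
    print(f"P(nextSNP, non-CHD) = {joint_ctrl:.3f}")
    print(f"P(nextSNP)          = {total:.3f}")
    print(f"P(CHD | nextSNP)    = {chd_given_snp:.2e}")
```

Under $H_0$ both conditional probabilities equal $p_0$, so the last quantity reduces to the posterior probability of CHD, which in that case equals the prior.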

In order to quantify the influence of CHD in the presence of the SNP, we compute the ratio of $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP})$ to $P({\rm nextSNP}|D_{\rm SNP},H_i)$, which gives an estimate of how
| Hypothesis | Probabilities | Cauc CS | Cauc MI | Asian CS |
|---|---|---|---|---|
| $H_0$ | $p_0$ | 0.288 ± 0.001 | 0.296 ± 0.001 | 0.136 ± 0.001 |
| $H_0$ | $P({\rm CHD}|D_{\rm SNP},H_0)$ | (4.00 ± 1.26) · 10^-3 | (1.00 ± 0.24) · 10^-3 | (4.00 ± 0.89) · 10^-3 |
| $H_0$ | $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_0)$ | (1.15 ± 0.36) · 10^-3 | (0.30 ± 0.07) · 10^-3 | (0.55 ± 0.12) · 10^-3 |
| $H_0$ | $P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_0)$ | 0.287 ± 1.018 | 0.296 ± 1.605 | 0.136 ± 0.340 |
| $H_0$ | $P({\rm nextSNP}|D_{\rm SNP},H_0)$ | 0.289 ± 1.018 | 0.296 ± 1.605 | 0.136 ± 0.340 |
| $H_0$ | $r_{\rm nextSNP,CHD}$ | (4.00 ± 14.16) · 10^-3 | (1.00 ± 5.54) · 10^-3 | (4.00 ± 10.00) · 10^-3 |
| $H_1$ | $p_{1,\rm CHD}$ | 0.290 ± 0.001 | 0.292 ± 0.001 | 0.151 ± 0.001 |
| $H_1$ | $p_{1,\rm \overline{CHD}}$ | 0.287 ± 0.001 | 0.300 ± 0.001 | 0.128 ± 0.001 |
| $H_1$ | $P({\rm CHD}|D_{\rm SNP},H_1)$ | (3.34 ± 7.57) · 10^-3 | (0.99 ± 3.18) · 10^-3 | (5.11 ± 6.96) · 10^-3 |
| $H_1$ | $P({\rm nextSNP},{\rm CHD}|D_{\rm SNP},H_1)$ | (0.97 ± 2.19) · 10^-3 | (0.29 ± 0.93) · 10^-3 | (0.77 ± 1.05) · 10^-3 |
| $H_1$ | $P({\rm nextSNP},\overline{\rm CHD}|D_{\rm SNP},H_1)$ | 0.286 ± 0.542 | 0.300 ± 0.947 | 0.128 ± 0.234 |
| $H_1$ | $P({\rm nextSNP}|D_{\rm SNP},H_1)$ | 0.287 ± 0.543 | 0.300 ± 0.947 | 0.129 ± 0.234 |
| $H_1$ | $r_{\rm nextSNP,CHD}$ | (3.38 ± 9.96) · 10^-3 | (0.96 ± 4.34) · 10^-3 | (6.02 ± 13.67) · 10^-3 |

Table III. Probabilities inferred from the combined data sets excluding the low-significance data sets and the data sets with extreme results. Excluded: Elahi et al. m, Dedoussis et al. [17] and Chen et al. [23]. Similarly to Table II, to each hypothesis there correspond several rows consisting of: a) the parameters given by the maximum-likelihood values (one row in the case of $H_0$ and two rows in the case of $H_1$); b) the posterior probability for the occurrence of CHD (one row for each hypothesis); c) the predicted probabilities for the presence of the SNP (three rows for each hypothesis); and d) the probability ratio that measures the influence of CHD in the presence of the SNP, computed from the combined data of each phenotype (one row for each hypothesis). Column 1: The hypotheses. Column 2: The inferred quantities, as described above. Columns 3-5: The values of the inferred quantities for the combined ethnicity and CHD phenotype.


much the occurrence of CHD indicates the presence of the SNP. This is also the probability in Eqn. (21). In the case of $H_0$, this ratio equals the posterior probability of occurrence of CHD. Conversely, in the case of $H_1$, this ratio is different from the posterior probability of occurrence of CHD albeit compatible with it. The occurrence of CHD indicates the presence of the SNP in of the order of 0.1% of patients (0.1 to 0.4% in the case of $H_0$, 0.1 to 0.6% in the case of $H_1$), which suggests that the occurrence of CHD is not a good marker for the presence of the SNP.

In order to quantify the influence of the SNP in the occurrence of CHD, we compute the ratio of $P({\rm CHD}|{\rm nextSNP},H_i)$ to $P({\rm CHD}|D_{\rm SNP},H_i)$, which gives an estimate of how much the presence of the SNP indicates the occurrence of CHD. This is also the probability in Eqns. (19) and (20). The presence of the SNP indicates the occurrence of CHD in of the order of 0.1% of patients (0.141 to 0.299% in the case of $H_0$, 0.158 to 0.295% in the case of $H_1$), which suggests that the presence of the SNP is not a risk factor for the emergence of CHD.

Hypothesis  Probabilities                     Phenotype (j) 
                                              Cauc CS                  Cauc MI                  Asian CS 

H_0         p_0                               0.308 ± 0.001            0.271 ± 0.001            0.177 ± 0.001 
            P(CHD|D_SNP, H_0)                 (4.00 ± 1.08) × 10^-3    (1.00 ± 0.20) × 10^-3    (4.00 ± 0.63) × 10^-3 
            P(nextSNP, CHD|D_SNP, H_0)        (1.12 ± 0.33) × 10^-3    (0.27 ± 0.05) × 10^-3    (0.71 ± 0.11) × 10^-3 
            P(nextSNP, ¬CHD|D_SNP, H_0)       0.306 ± 0.923            0.270 ± 1.220            0.177 ± 0.314 
            P(nextSNP|D_SNP, H_0)             0.308 ± 0.923            0.271 ± 1.220            0.177 ± 0.314 
            r_nextSNP,CHD                     (4.00 ± 12.05) × 10^-3   (1.00 ± 4.51) × 10^-3    (4.00 ± 7.11) × 10^-3 

H_1         p_1,CHD                           0.306 ± 0.001            0.270 ± 0.001            0.190 ± 0.001 
            p_1,¬CHD                          0.309 ± 0.001            0.271 ± 0.001            0.165 ± 0.001 
            P(CHD|D_SNP, H_1)                 (3.93 ± 6.97) × 10^-3    (0.92 ± 2.58) × 10^-3    (3.90 ± 4.35) × 10^-3 
            P(nextSNP, CHD|D_SNP, H_1)        (1.20 ± 2.14) × 10^-3    (0.25 ± 0.70) × 10^-3    (0.74 ± 0.828) × 10^-3 
            P(nextSNP, ¬CHD|D_SNP, H_1)       0.308 ± 0.535            0.271 ± 0.698            0.165 ± 0.187 
            P(nextSNP|D_SNP, H_1)             0.309 ± 0.535            0.271 ± 0.698            0.165 ± 0.187 
            r_nextSNP,CHD                     (3.90 ± 9.68) × 10^-3    (0.91 ± 3.48) × 10^-3    (4.48 ± 7.13) × 10^-3 


Table IV. Probabilities inferred from the combined data sets excluding the extreme 
data sets. Excluded: Georges et al. [12], Bennet et al. [16] and Hou et al. [21]. Similarly to 
Table II, to each hypothesis there correspond several rows consisting of: a) the parameters given 
by the maximum-likelihood values (one row in the case of H_0 and two rows in the case of H_1); b) 
the posterior probability for the occurrence of CHD (one row for each hypothesis); c) the predicted 
probabilities for the presence of the SNP (three rows for each hypothesis); and d) the probability 
ratio that measures the influence of CHD in the presence of the SNP, computed from the combined 
data of each phenotype (one row for each hypothesis). Column 1: The hypotheses. Column 2: The 
inferred quantities, as described above. Columns 3-5: The values of the inferred quantities for the 
combined ethnicity and CHD phenotype. 

B. Sensitivity of the results 

To test the robustness of this meta-analysis, we devise two tests of the sensitivity of 
the results, namely to low-significance data sets combined with data sets with extreme results, 
and to extreme data sets. 

To test the sensitivity of the results to low-significance data sets, we exclude the data sets 
with comparatively small sample sizes for the same CHD phenotype, namely the study by 
Elanhi et al. [10] and the study by Chen et al. [23], from the combination. We also exclude 
the studies with extreme results (i.e., the studies with the largest Bayes factor), namely 
the study by Dedoussis et al. We recompute the Bayes factors, as well as the 
probabilities of CHD (Table III). We observe that the Bayes factor in the new combination 
changes by 18%, -38% and 24%, respectively for the CS Caucasian, the MI Caucasian and 
the CS Asian population. The inferred parameters and probabilities vary by -6 to 6%, -5 
to 2%, and -1 to 4%, respectively for the CS Caucasian, the MI Caucasian and the CS 
Asian population. The largest difference is observed for the CS Caucasian population due 
to the exclusion of the study by Elanhi et al. [10]. The exclusion of the study by Dedoussis 
et al. from the MI Caucasian population causes predominantly negative differences. 

To test the sensitivity of the results to extreme data sets, we exclude the data sets 
with comparatively large sample sizes for the same CHD phenotype, namely the study by 
Georges et al. [12], the study by Bennet et al. [16] and the study by Hou et al. [21], from 
the combination. These are also the studies with the smallest Bayes factor for each CHD 
phenotype. We recompute the Bayes factors, as well as the probabilities of CHD 
(Table IV). We observe that the Bayes factor in the new combination changes by 3%, -19% 
and 32%, respectively for the CS Caucasian, the MI Caucasian and the CS Asian population. 
The inferred parameters and probabilities vary by -20 to -1%, 5 to 11%, and -26 to 25%, 
respectively for the CS Caucasian, the MI Caucasian and the CS Asian population. The 
largest difference is observed for the CS Asian population due to the exclusion of the study 
by Hou et al. [21]. The exclusion of the study by Georges et al. [12] from the CS Caucasian 
population causes predominantly negative differences. 

In both tests, the differences in the Bayes factor leave the result of the hypothesis testing 
unchanged, while the differences in the inferred parameters and probabilities also leave the 
conclusions unchanged. We thus infer that this formalism is largely insensitive to a) 
low-significance data sets combined with data sets with extreme results, and to b) extreme data 
sets, which renders this formalism significantly robust. 
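
The exclusion procedure used in both tests can be sketched as a leave-out loop. The example 
below is a minimal illustration under stated assumptions and is not the likelihood of this 
manuscript: it uses a standard beta-binomial Bayes factor with uniform priors, it combines 
studies by pooling counts (which assumes that all studies share the same SNP frequencies), and 
the study names and counts are hypothetical. 

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor_01(k_case, n_case, k_ctrl, n_ctrl):
    """log B01 = log[P(data|H0) / P(data|H1)] with uniform Beta(1, 1) priors.

    H0: one SNP frequency shared by cases and controls.
    H1: separate SNP frequencies for cases and controls.
    The binomial coefficients cancel in the ratio.
    """
    log_m0 = log_beta(k_case + k_ctrl + 1,
                      (n_case - k_case) + (n_ctrl - k_ctrl) + 1)
    log_m1 = (log_beta(k_case + 1, n_case - k_case + 1)
              + log_beta(k_ctrl + 1, n_ctrl - k_ctrl + 1))
    return log_m0 - log_m1


def pooled_counts(studies, exclude=()):
    """Sum (k_case, n_case, k_ctrl, n_ctrl) over the studies not excluded."""
    totals = [0, 0, 0, 0]
    for name, counts in studies.items():
        if name in exclude:
            continue
        totals = [t + c for t, c in zip(totals, counts)]
    return tuple(totals)


def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical counts: SNP carriers and sample sizes for cases and controls.
studies = {
    "study_A": (30, 100, 58, 200),
    "study_B": (12, 40, 11, 40),
    "study_C": (150, 500, 140, 500),
}

baseline = exp(log_bayes_factor_01(*pooled_counts(studies)))
for name in studies:
    reduced = exp(log_bayes_factor_01(*pooled_counts(studies, exclude={name})))
    change = percent_change(reduced, baseline)
    print(f"excluding {name}: B01 = {reduced:.2f} ({change:+.0f}%)")
```

A result is insensitive in the sense used above when the Bayes factor stays on the same side 
of the decision threshold after each exclusion, regardless of the size of the percentage change. 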


IV. CONCLUSIONS 

In this manuscript we investigated the correlation between the occurrence of CHD and 
the presence of the -308 TNF-a SNP from fifteen independent data sets on Caucasians for 
two CHD phenotypes and from five independent data sets on Asians for one CHD phenotype. 
We showed how to combine independent data sets and to infer correlations using Bayesian 
analysis. 

Hypothesis testing on the combined data sets indicated that there is no evidence for a 
correlation between the occurrence of CHD and the presence of the SNP, either in Caucasians 
or in Asians. This result agrees with previous meta-analyses. As a measure of a 
possible correlation, we computed the conditional probability of CHD given the SNP, 
normalized to the probability that CHD occurs, finding that the presence of the SNP indicates 
the occurrence of CHD in of the order of 0.1% of patients, i.e. of the order of 0.1% of the 
occurrences of CHD are concomitant with the presence of the SNP. We also tested the sensitivity 
of the results by excluding selected data sets from the meta-analysis. We found changes of order 
10%, leaving the results unchanged and thus establishing this formalism as significantly robust. 

An interesting extension of this work, for the sake of completeness, is the inclusion of 
studies referring to Africans and Indians, which are currently too few to yield convincing results. 